Dagstuhl seminar proposal „Ontologies and Text Mining for Life Science“

Ontologies and Text Mining for Life Sciences

Current Status and Future Perspectives

Dagstuhl, 25-28 March 2008

Executive Summary

Keywords: Text Mining, natural language processing, ontologies, ontology design, machine learning, bioinformatics, medical informatics, knowledge management

1 Introduction

Researchers in Text Mining and researchers developing ontological resources both provide solutions for preserving semantic information properly, i.e., in ontologies and/or fact databases. Researchers from the two fields tend to work independently of each other, yet they share an interest in profiting from ongoing research in the complementary domain. The relatedness of the two domains has led to the idea of organizing a workshop that brings together members of both research communities.

2 The gap between Text Mining and ontologies

Life Science researchers deliver their findings in scientific publications. These documents are nowadays distributed electronically and are increasingly processed by automatic means so that the findings and data they contain can also be incorporated into structured scientific databases. Methods for this purpose are generally subsumed under the term “Text Mining”, encompassing techniques from the fields of machine learning, information retrieval and natural language processing. Text Mining-based solutions have, for instance, been developed for the identification of protein-protein interactions and of gene regulatory events, for the functional annotation of proteins, for the identification and prioritization of disease-related genes, and for the analysis of results from high-throughput experiments.
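As a minimal illustration of the kind of processing meant here, and not a method taken from this proposal, the sketch below implements a simple sentence-level co-occurrence baseline for finding candidate protein-protein interactions. It assumes a small hypothetical synonym dictionary (`LEXICON`) whose placeholder identifiers (`PROT:1`, …) are not real database accessions; the function names are likewise illustrative. Protein mentions are looked up in the dictionary, mapped to identifiers, and every pair of distinct identifiers found in the same sentence is counted as a candidate.

```python
import itertools
import re
from collections import Counter

# Hypothetical synonym dictionary: lower-cased surface form -> normalized identifier.
# The identifiers are illustrative placeholders, not real database accessions.
LEXICON = {
    "p53": "PROT:1",
    "tp53": "PROT:1",
    "mdm2": "PROT:2",
    "brca1": "PROT:3",
}


def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def find_mentions(sentence):
    # Tokenize on alphanumeric runs and map each known token to its identifier,
    # so that synonyms such as "p53" and "TP53" collapse to one concept.
    tokens = re.findall(r"[A-Za-z0-9]+", sentence)
    return {LEXICON[t.lower()] for t in tokens if t.lower() in LEXICON}


def cooccurring_pairs(text):
    # Count each unordered pair of distinct identifiers once per sentence.
    counts = Counter()
    for sentence in split_sentences(text):
        ids = sorted(find_mentions(sentence))
        for pair in itertools.combinations(ids, 2):
            counts[pair] += 1
    return dict(counts)


if __name__ == "__main__":
    abstract = "MDM2 binds p53 and promotes its degradation. TP53 is frequently mutated."
    print(cooccurring_pairs(abstract))
```

Co-occurrence only proposes candidates and cannot tell an interaction from a mere joint mention; practical systems add linguistic analysis or machine-learned classifiers on top. The mapping of "p53" and "TP53" to a single identifier is a toy instance of the normalization problem that the following paragraphs discuss.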

Text Mining for the Life Sciences has received considerable interest over the last years and is now an established topic at conferences and workshops (e.g., ISMB, KDD, ECCB, Coling, ACL, PSB). It has also led to international large-scale challenge evaluations (KDD Cup, the Genomics track at TREC, BioCreative I & II, BioNLP). This interest is driven by two factors: the ever-increasing number of publications, which imposes an unbearable workload on the individual researcher, and promising advances in natural language processing and machine learning, which could solve this problem if they are integrated into biomedical applications.

Text Mining has to cope with a large semantic gap between the raw textual data and the representation of meaningful results in databases, e.g., the normalization of events mentioned in the text to conceptual representations of those events according to “textbook” knowledge. It is hoped that ontologies will fill this gap by delivering a structured representation of biomedical knowledge. Although large and increasingly comprehensive biological ontologies

مارسا12